[superlog] Downgrade ClickHouse timeout log from ERROR to WARN in tracking health checks by superlog-app[bot] · Pull Request #468 · databuddy-analytics/Databuddy

superlog-app · 2026-06-11T14:19:07Z

Summary

The isTrackingSetup dashboard endpoint occasionally logs ERROR when its ClickHouse query for blocked-traffic data exceeds the 10-second timeout. The endpoint itself recovers gracefully — it returns 200 with tracking_issue: null — so the ERROR is a false alarm that creates noisy incidents.

Both getRecentBlockedTrackingIssue and getTrackingEventsStatus share the same Promise.race timeout guard and the same mis-levelled catch block. The fix differentiates between the known timeout (logged as WARN) and genuinely unexpected errors (kept at ERROR), applied symmetrically to both functions.

An alternative approach would be to pass an AbortSignal to the ClickHouse client so the underlying query is cancelled when the race times out, rather than letting it continue running in the background — that would reduce ClickHouse load during busy periods but requires ClickHouse client support for cancellation.

Incident on Superlog

Was this PR helpful? Leave feedback — goes straight to the Superlog team.

Summary by cubic

Downgraded ClickHouse timeout logs to WARN in tracking health checks to stop false alarms. Unexpected errors remain ERROR; endpoint behavior is unchanged.

Bug Fixes
- Detect the known "ClickHouse query timeout" in getTrackingEventsStatus and getRecentBlockedTrackingIssue and log WARN; keep other errors at ERROR.

^{Written for commit 8f429e8. Summary will update on new commits.}

…cking health checks

vercel · 2026-06-11T14:19:15Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
dashboard	Ready	Preview, Comment	Jun 11, 2026 2:21pm
databuddy-status	Ready	Preview, Comment	Jun 11, 2026 2:21pm

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
documentation	Skipped		Jun 11, 2026 2:21pm

unkey-deploy · 2026-06-11T14:19:32Z

The latest updates on your projects. Learn more about Unkey Deploy

Name	Status	Preview	Inspect	Updated (UTC)
api (preview)	Ready	Visit Preview	Inspect	Jun 11, 2026 2:20pm

greptile-apps · 2026-06-11T14:22:03Z

Greptile Summary

This PR downgrade-classifies ClickHouse timeout log events from ERROR to WARN in the isTrackingSetup endpoint, matching the graceful-recovery behavior already in place (both callers return safe defaults on timeout). The fix is applied symmetrically to getTrackingEventsStatus and getRecentBlockedTrackingIssue.

The timeout string "ClickHouse query timeout" is a bare literal that appears in four places (two Error(...) constructors and two isTimeout string comparisons) with no shared constant, so a future typo or rename in one spot would silently break the distinction and fall back to ERROR.
The PR description itself notes that the background ClickHouse query is not cancelled when the race times out; no code change addresses this, and ClickHouse load during busy periods remains unchanged.

Confidence Score: 4/5

Safe to merge — the change only affects log levels in two catch blocks; the observable behavior of the endpoint is unchanged.

The fix correctly demotes noisy timeout alerts to WARN while preserving ERROR for unexpected failures. The only fragility is that the timeout sentinel string is duplicated four times as a bare literal across throw sites and catch comparisons, with no shared constant to keep them in sync. A future rename in one place would not be caught by the type system and would silently break the discrimination.

packages/rpc/src/routers/websites.ts — specifically the four inline occurrences of "ClickHouse query timeout" that would benefit from a shared constant.

Important Files Changed

Filename	Overview
packages/rpc/src/routers/websites.ts	Downgrade ClickHouse timeout catch blocks from ERROR to WARN in both getTrackingEventsStatus and getRecentBlockedTrackingIssue; detection relies on an exact string match against "ClickHouse query timeout" with no shared constant.

Sequence Diagram

sequenceDiagram
    participant Client
    participant isTrackingSetup
    participant getTrackingEventsStatus
    participant getRecentBlockedTrackingIssue
    participant ClickHouse
    participant Logger

    Client->>isTrackingSetup: GET /isTrackingSetup
    isTrackingSetup->>getTrackingEventsStatus: call
    isTrackingSetup->>getRecentBlockedTrackingIssue: call

    getTrackingEventsStatus->>ClickHouse: chQuery (analytics.events)
    getRecentBlockedTrackingIssue->>ClickHouse: chQuery (analytics.blocked_traffic)

    alt Query completes within 10s
        ClickHouse-->>getTrackingEventsStatus: result rows
        ClickHouse-->>getRecentBlockedTrackingIssue: result rows
        getTrackingEventsStatus-->>isTrackingSetup: EventsCheckResult
        getRecentBlockedTrackingIssue-->>isTrackingSetup: BlockedTrackingIssueRow or null
    else Timeout fires (10s)
        Note over getTrackingEventsStatus,getRecentBlockedTrackingIssue: Promise.race rejects with Error("ClickHouse query timeout")
        getTrackingEventsStatus->>Logger: logger.warn (was ERROR)
        getRecentBlockedTrackingIssue->>Logger: logger.warn (was ERROR)
        getTrackingEventsStatus-->>isTrackingSetup: hasEvents false, recentEvents 0
        getRecentBlockedTrackingIssue-->>isTrackingSetup: null
        Note over ClickHouse: Query continues running in background
    else Unexpected ClickHouse error
        getTrackingEventsStatus->>Logger: logger.error (unchanged)
        getRecentBlockedTrackingIssue->>Logger: logger.error (unchanged)
    end

    isTrackingSetup-->>Client: 200 tracking_issue null

Comments Outside Diff (2)

packages/rpc/src/routers/websites.ts, line 168-173 (link)

The timeout string "ClickHouse query timeout" is duplicated as a bare literal in both getTrackingEventsStatus (line 169) and getRecentBlockedTrackingIssue (line 249), and is now also matched character-for-character in two separate catch blocks. If this string is ever changed in one setTimeout call but not the other — or not in the corresponding isTimeout check — the catch will silently fall through to the ERROR branch, defeating the purpose of this fix. A shared constant would keep all four references in sync.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
packages/rpc/src/routers/websites.ts, line 248-253 (link)

Same duplicated literal issue in the second setTimeout — this should reference the shared constant too so both throw sites stay in sync.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

_{Reviews (1): Last reviewed commit: "[superlog] Downgrade ClickHouse timeout ..." | Re-trigger Greptile}

[superlog] Downgrade ClickHouse timeout log from ERROR to WARN in tra…

8f429e8

…cking health checks

vercel Bot temporarily deployed to Preview – documentation June 11, 2026 14:19 Inactive

unkey-deploy Bot deployed to links - preview June 11, 2026 14:19 View deployment

vercel Bot deployed to Preview – databuddy-status June 11, 2026 14:19 View deployment

unkey-deploy Bot deployed to api - preview June 11, 2026 14:20 View deployment

vercel Bot deployed to Preview – dashboard June 11, 2026 14:21 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[superlog] Downgrade ClickHouse timeout log from ERROR to WARN in tracking health checks#468

[superlog] Downgrade ClickHouse timeout log from ERROR to WARN in tracking health checks#468
superlog-app[bot] wants to merge 1 commit into
mainfrom
superlog/downgrade-clickhouse-timeout-log-level

superlog-app Bot commented Jun 11, 2026 •

edited by cubic-dev-ai Bot

Loading

Uh oh!

vercel Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

unkey-deploy Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 11, 2026 •

edited

Loading

Comments Outside Diff (2)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

Conversation

superlog-app Bot commented Jun 11, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Summary by cubic

Uh oh!

vercel Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

unkey-deploy Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (2)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

superlog-app Bot commented Jun 11, 2026 •

edited by cubic-dev-ai Bot

Loading

vercel Bot commented Jun 11, 2026 •

edited

Loading

unkey-deploy Bot commented Jun 11, 2026 •

edited

Loading

greptile-apps Bot commented Jun 11, 2026 •

edited

Loading